Putting Proteins back into Water. 
We introduce a simplified protein model where the solvent (water) degrees of freedom appear 
explicitly (although in an extremely simplified fashion). Using this model we are able to recover 
the thermodynamic phenomenology of proteins over a wide range of temperatures. In particular 
we describe both the warm and the cold protein denaturation within a single framework, while 
addressing important issues about the structure of model proteins. 

One of the main goals of statistical physics in the last 
decade has been to understand the "folding code": how 
the amino-acid sequence of a protein (coded in DNA, the 
"genetic code") uniquely determines its functional 
("native") structure, or fold. Understanding the principles 
that drive a protein to fold to its native structure is of 
great conceptual and practical relevance, since it could 
lead, for example, to high specificity drugs. 

Proteins are extremely complex structures: they are 
long heteropolymers made of up to 20 different amino-acid 
species, each of them with its own chemical, electrostatic 
and steric properties; the physiological solvent, 
an aqueous solution, and its characteristics play a funda- 
mental role both in the dynamics and in the thermody- 
namics of folding. It is therefore not surprising that only 
in recent times statistical physicists have begun work- 
ing on this problem, mainly after the introduction of the 
so-called HP model, where the above mentioned richness 
has been reduced to a manageable level. In the HP 
model, proteins are modeled as self-avoiding polymers 
on a lattice (two or three dimensional), greatly reducing 
the number of accessible conformations. The chemical 
and electrostatic properties of amino-acids have also 
been simplified: indeed, it has long been recognized 
that the main force stabilizing the native conformations 
of globular proteins is the hydrophobicity of non-polar 
amino-acids. Consequently, the important properties 
of amino-acids are reduced to two: they are either polar 
(ions or dipoles, labeled with P) or non-polar (H). 

Hydrophobicity can be described as the tendency of hy- 
drophobic molecules to reduce as much as possible their 
surface of contact with water: two hydrophobic molecules 
try to stick together in order to hide from water their 
mutual surface of contact. Consequently, hydrophobic- 
ity has been introduced in the HP model as an effective 
attractive interaction between H amino-acids. Then, the 
solvent degrees of freedom can be neglected. Here we 
show that such a simplification can be removed, and wa- 
ter can be taken into account, keeping the complexity 
of the model at a still manageable level: the benefits 
are a better description of the protein phenomenology 
(namely, cold destabilization and eventually denaturation) 
and some insights on the structure of the protein 
core. 

In the last fifteen years there has been a growing body 
of evidence for the so-called cold destabilisation of proteins: 
the free energy difference between denatured 
and native conformations of proteins has a parabolic 
shape, with a maximum at temperatures of the order of 
15-25°C, or lower, implying that at lower temperatures 
the native conformation is less and less stable. In some 
cases, even the cold denaturation of proteins has been 
obtained. 

The HP model is unable to deal with cold destabilization, 
since its low temperature state is compact and becomes more 
and more stable down to T = 0. Is a good description 
of cold destabilisation, and eventually denaturation, 
relevant for protein folding? We think that the answer is 
affirmative for at least two reasons. 

In order to describe protein folding with a simple 
model, it is important to capture the essential physics 
of the process, at the temperatures at which it takes 
place. If the stability of native conformations of proteins 
begins to decrease below 15-25°C, it is unlikely, 
at least a priori, that the physics responsible for such 
a behavior is not important around the maximal stabil- 
ity temperature, in a range relevant for in vivo protein 
folding. A further reason to believe that a good model 
for protein folding should also agree with the cold desta- 
bilisation phenomenology is that, actually, there is no 
clear-cut distinction between the physics that stabilises 
proteins, and the one that destabilises them. In both 
cases a re-analysis of the concept of hydrophobicity and 
of hydrophobic hydration is necessary. 

Already Frank and Evans identified the origin of hy- 
drophobicity in the partial ordering of water around non- 
polar molecules (such as, for example, pentane, benzene 
and some amino-acids). Water molecules tend to build 
ice-like cages around non-polar molecules. Although a 
detailed analysis of these structures is, to our knowledge, 
still lacking (although some better understanding 
and consensus have recently been emerging), we can 
guess their energetic and entropic properties. Indeed, 
water molecules forming these cages are highly hydrogen 
bonded, much as in ice; consequently, their formation is 
energetically favorable with respect to bulk liquid water. 
Yet, the possible molecular arrangements in the cages are 
a small number compared to all the disordered molecu- 
lar conformations typical of liquid water. The latter are 
energetically unfavorable with respect to bulk water be- 
cause water molecules fail to form hydrogen bonds with 
hydrophobic amino-acids. Therefore the free energy of 
formation of a cage ($F_{\rm cage} - F_{\rm no\ cage} = \Delta F$) is a balance 
between an enthalpy gain/loss and an entropy loss/gain: 
ordered cages give an enthalpy gain ($\Delta H < 0$) and an 
entropy loss ($\Delta S < 0$); the scenario is the opposite for 
disordered states. All of the above arguments call for a 
model able to reproduce (at least qualitatively) such a 
rich phenomenology. 

The model we propose here borrows two simplifications 
from the HP model: proteins are still modeled 
as heteropolymers on a lattice, made of just two different 
amino-acid species, polar (P) and non-polar (H). 
Then we put proteins back into water: every site of 
the lattice that is not occupied by the polymer is occupied 
by water (in general, by a group of water molecules 
that can be arranged in $q$ states). Water is described 
using the Muller-Lee-Graziano (MLG) two-state model 
(Fig. 1a), which Silverstein et al. have recently shown 
to be consistent with a molecular model of the water-amino 
acid system. The energy of each H amino-acid 
depends on the states of the water sites it is in contact 
with: as a simplifying assumption (see Fig. 1b), we say 
that out of the $q$ possible states of a water site, one can 
be singled out as a cage conformation (labeled $s = 0$), 
energetically favorable with energy $-J$ ($J > 0$), and the 
remaining $q - 1$ states ($s = 1, \dots, q - 1$) are energetically 
unfavorable with energy $K > 0$ (they represent the disordered 
states of reduced hydrogen-bond coordination). 
We stress that the term (un)favorable is always with respect 
to bulk liquid water. Water sites that are not in 
contact with H amino-acids (that is, bulk water sites) do 
not contribute to the energy (whereas they would have an 
energetic description according to the MLG model, that 
yet has five free parameters, too many for a simple theoretical 
model). P amino-acids do not interact with water 
so that their energy is always 0: such a crude approxi- 
mation is made with the idea that hydrophobicity is the 
leading effect stabilizing the native conformation of pro- 
teins. Some better description of the water-P interaction 
would be welcome, but such ingredient is unnecessary for 
our present purposes. 

Given a protein of $N$ amino-acids, with the sequence 
$a_1, a_2, \dots, a_N$ ($a_i$ = P or H), the energy of the protein is 
then 

$$\mathcal{H} = \sum_{\langle j, \mathrm{H} \rangle} \left[ -J \delta_{s_j,0} + K \left( 1 - \delta_{s_j,0} \right) \right] \qquad (1)$$

where the sum is over the water sites $j$ that are nearest 
neighbors of some H amino-acid, and $s_j$ is the state of 
water site $j$. Starting from (1) we 
can write the partition function of the system as 

$$Z_N = \sum_{C} Z_N(C) \qquad (2)$$

where $Z_N(C)$ is the partition function associated to a 
single conformation $C$: 

$$Z_N(C) = q^{n_0(C)} \left[ (q - 1) e^{-\beta K} + e^{\beta J} \right]^{n_1(C)} \qquad (3)$$

where the dependence on the water degrees of freedom 
has been explicitly calculated and $\beta = 1/T$ is the inverse 
temperature. $n_1(C)$ is the number of 
water sites nearest neighbors of some H amino-acid, $n_0(C)$ 
is the number of bulk water sites. 
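
As a sanity check on the shell factor in (3), the sum over water states can be carried out by brute force for a handful of shell sites and compared with the closed form $[(q - 1) e^{-K/T} + e^{J/T}]^{n_1}$. The following is only a minimal sketch: the function names and the small parameter values are illustrative assumptions, and the bulk factor $q^{n_0}$ is left out because bulk sites carry no energy.

```python
import itertools
import math


def site_energy(s, J, K):
    """Energy of one shell water site, as in (1): -J in the cage state s = 0, +K otherwise."""
    return -J if s == 0 else K


def z_shell_bruteforce(n1, q, J, K, T):
    """Sum the Boltzmann weights over all q**n1 states of n1 shell water sites (k_B = 1)."""
    beta = 1.0 / T
    total = 0.0
    for states in itertools.product(range(q), repeat=n1):
        energy = sum(site_energy(s, J, K) for s in states)
        total += math.exp(-beta * energy)
    return total


def z_shell_closed(n1, q, J, K, T):
    """Closed form of the same sum: the sites are independent, so the sum factorizes."""
    beta = 1.0 / T
    return ((q - 1) * math.exp(-beta * K) + math.exp(beta * J)) ** n1
```

The two functions agree because each shell site contributes independently; the brute-force version is only practical for small `q` and `n1`.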

We deal with model proteins of length up to $N = 17$ 
on the square lattice, and compute the partition function 
and all the thermodynamic quantities and averages 
by exact enumeration of the 2155667 different conformations. 
We show the results for the particular sequence 
PHPPHPPHPHPPHPPHH. We choose $J = 1$ (actually, 
both $K$ and the temperature $T$ can be normalized 
with respect to $J$), $K = 2$ and q = 10^ (a better determination 
of these values could come from molecular dynamics 
and structural studies). We take the Boltzmann 
constant $k_B = 1$. 
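
The exact enumeration can be sketched as follows. This is an illustrative reconstruction, not the code used for the results reported here: it generates one self-avoiding walk per rotation/reflection class, records for each conformation the number $n_1$ of water sites touching an H monomer and the number $n_c$ of monomer-monomer contacts, and weights each conformation by $w^{n_1}$ with $w = [(q - 1) e^{-K/T} + e^{J/T}]/q$. The last step assumes that the total number of water sites $n_0 + n_1$ is the same for all conformations, so that $q^{n_0}$ only contributes a constant factor; the function names and the value of `q` in the usage are hypothetical.

```python
from collections import Counter
import math

MOVES = ((1, 0), (0, 1), (-1, 0), (0, -1))


def enumerate_conformations(n):
    """Yield self-avoiding walks of n >= 2 monomers, one per symmetry class.

    Rotations are removed by fixing the first bond along +x, reflections by
    requiring the first turn (if any) to point along +y.
    """
    def extend(path, occupied, turned):
        if len(path) == n:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy < 0:
                continue  # mirror image of a walk whose first turn is upward
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            path.append(nxt)
            occupied.add(nxt)
            yield from extend(path, occupied, turned or dy != 0)
            path.pop()
            occupied.remove(nxt)

    yield from extend([(0, 0), (1, 0)], {(0, 0), (1, 0)}, False)


def features(conf, seq):
    """Return (n1, nc) for one conformation of the H/P sequence seq."""
    index = {site: i for i, site in enumerate(conf)}
    shell, contacts = set(), 0
    for i, (x, y) in enumerate(conf):
        for dx, dy in MOVES:
            nb = (x + dx, y + dy)
            j = index.get(nb)
            if j is None:
                if seq[i] == "H":
                    shell.add(nb)  # water site next to an H monomer
            elif j > i + 1:
                contacts += 1  # non-bonded neighbors, counted once per pair
    return len(shell), contacts


def density_of_states(seq):
    """Histogram of (n1, nc) over all conformations of seq."""
    return Counter(features(c, seq) for c in enumerate_conformations(len(seq)))


def mean_contacts(dos, T, J, K, q):
    """Thermal average of nc at temperature T (k_B = 1)."""
    w = ((q - 1) * math.exp(-K / T) + math.exp(J / T)) / q
    z = sum(g * w ** n1 for (n1, _), g in dos.items())
    return sum(g * nc * w ** n1 for (n1, nc), g in dos.items()) / z
```

For example, `dos = density_of_states("HPPH")` followed by `mean_contacts(dos, 0.5, J=1.0, K=2.0, q=1000)` gives the average number of contacts of a four-monomer toy chain. With this counting convention the generator should reproduce the 2155667 conformations quoted above for $N = 17$, although the pure-Python loop is slow at that length.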

In Fig. 2 the specific heat $C_v$ and the average number 
of monomer-monomer contacts, $n_c$, are shown. The 
low-temperature peak in the specific heat coincides with 
a jump of $n_c$: at lower temperatures the protein is 
swollen, and maximizes the number of water-H contacts, 
in agreement with cold denaturation. The number of 
contacts, $n_c$, begins decreasing at the high-temperature 
peak of the specific heat, which therefore corresponds 
to the usual warm denaturation phenomenon. 
Between the cold denaturation temperature $T_c$ and the 
warm denaturation temperature there is a region where the most probable 
conformation is the one represented in the inset of 
Fig. 2: as can be seen, it is compact with a hydrophobic 
core, out of reach for water (we also checked that 
this native state is unique, in that its Boltzmann weight 
is the largest above $T_c$). We have analyzed the behavior 
of different protein lengths and of different sequences, 
and we have always found the same qualitative behavior 
of $C_v$ and $n_c$. Our model is therefore able to describe, 
within a single framework, both cold and warm denaturation. 
Moreover, it shows a native state with a mostly 
hydrophobic core. 

Although the ratio between $T_c$ and the warm denaturation 
temperature in Fig. 2 is unphysical, using the full MLG model it is possible to come 
closer to real values: the price to be paid is the larger 
number of parameters to adjust. In this Letter we ad- 
dress the physical principles responsible for the thermo- 
dynamic behavior of proteins on a broad range of tem- 
peratures: we believe that the differences between the 
bimodal model and the MLG model (and other possible 
more refined models) govern the details of the behavior 
more than the essential features. 






We next compare the free energy, enthalpy and entropy 
variations of folding of our model with those from 
the literature. Such a comparison is a difficult 
one, since it is hard to define what a denatured state 
is in our theoretical calculations. Therefore, as a simple 
approximation, we consider as denatured those conformations 
with at most 4 monomer-monomer contacts (a 
polymer of 17 monomers on a square lattice has at most 
9 monomer-monomer contacts). The native state has 8 
monomer-monomer contacts. 

In Fig. 3 we show $F_{\rm Denatured} - F_{\rm Native} = \Delta F$, $\Delta H$ 
and $T\Delta S$. They agree qualitatively with the ones 
from experiments. We point out the presence of two 
temperatures below and above which $\Delta F < 0$: there the 
denatured state of our model protein is more stable than 
the native state. Between these two temperatures, instead, 
$\Delta F > 0$, and the native state is the most stable. 
In the same temperature range where $\Delta F > 0$, $\Delta H$ 
and $T\Delta S$ have a strong temperature dependence: they 
even change sign, a signature of the rich physics behind 
the water-protein system. At high temperatures we find 
that both $\Delta H$ and $\Delta S$ saturate ($T\Delta S$ grows linearly, 
therefore $\Delta S$ saturates), as experimentally observed. 
Some particular care should be paid to the low temperature 
behavior of $\Delta H$ and $T\Delta S$. Indeed, $\Delta H$ goes to 
a constant value, which is consistent with a lower bound 
for the energies, and $T\Delta S$ tends to 0 with $T$. Experiments 
should be made below $T_c$ to assess such a behavior 
(although a recent model suggests such a scenario). We 
find therefore that our model reproduces qualitatively the 
known calorimetric data of protein denaturation over a 
broad range of temperatures. 
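
For concreteness, one consistent way to compute these differences from the conformational partition functions is written out below. It is a sketch under stated assumptions rather than a definition taken from the text: the denatured ensemble $D$ is the set of conformations with at most 4 monomer-monomer contacts, the native state is the single conformation $C_{\rm nat}$, $k_B = 1$, and the enthalpy is identified with the average energy because the lattice model has no pressure-volume term.

```latex
\begin{align}
Z_D &= \sum_{C:\, n_c(C) \le 4} Z_N(C), \qquad Z_{\rm nat} = Z_N(C_{\rm nat}), \\
\Delta F &= F_D - F_{\rm nat} = -T \ln \frac{Z_D}{Z_{\rm nat}}, \\
\Delta H &= \langle \mathcal{H} \rangle_D - \langle \mathcal{H} \rangle_{\rm nat},
\qquad \langle \mathcal{H} \rangle_X = T^2 \frac{\partial \ln Z_X}{\partial T}, \\
T \Delta S &= \Delta H - \Delta F .
\end{align}
```

With these conventions $\Delta F > 0$ means that the native conformation carries more statistical weight than the whole denatured ensemble.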

The hydrophobic effect is often modeled through attractive 
effective HH interactions. Within our framework, 
we consider a system of two H amino-acids in solution 
and we compare the partition function of the system 
when the two amino-acids are in contact, $Z_c$, with 
the one when the two amino-acids have no mutual contacts, 
$Z_0$. The effective attractive interaction is defined 
as $\epsilon = T \ln(Z_c / Z_0)$ ($\epsilon$ is positive if attractive, with this 
definition). The $T \to \infty$ limit is 

$$\epsilon(T \to \infty) = 2K - \frac{2}{q}(K + J) \qquad (4)$$

and is attractive for large values of $q$: it is the usual hydrophobic 
effective interaction. Yet, the $T = 0$ limit is 
$\epsilon = -2J$, repulsive. A meaningful effective interaction 
should at least include such a temperature dependence. 
Actually, the strong temperature dependence of e is not 
the only limitation to a definition of an attractive HH in- 
teraction. Indeed, such an interaction can be meaningful 
only for amino-acids surrounded by water molecules, but 
it cannot be defined in the core of proteins, where wa- 
ter is absent. As a consequence, in the absence of some 
true interactions between amino-acids, the hydrophobic 
interaction alone is not able to favor thermodynamically 
the native state against different compact states obtained 
by reordering only the core of the protein. As an example, 
the two conformations in Fig. 4, corresponding 
to the sequence PPHPPPPPFPPPHHHP have the 
same probability to occur in our model, since they hide 
and expose to water the same number of H amino-acids. 
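
The temperature dependence of the effective HH interaction discussed above can be made explicit with a short numerical sketch. It assumes that bringing the two H amino-acids into contact removes exactly two water sites from their hydration shells and turns them into bulk sites, which is the counting implied by the two limits quoted in the text; under that assumption $\epsilon(T) = -2T \ln w(T)$ with $w = [(q - 1) e^{-K/T} + e^{J/T}]/q$. The function name, the `freed_sites` argument and the default parameter values are illustrative assumptions.

```python
import math


def effective_hh_interaction(T, J=1.0, K=2.0, q=1000, freed_sites=2):
    """Effective interaction eps = T ln(Zc/Z0) between two H monomers (k_B = 1).

    Assumes contact converts `freed_sites` shell water sites into bulk sites,
    so that Zc/Z0 = w**(-freed_sites), with w the weight of one shell site
    relative to one bulk site. Requires q > 1.
    """
    # log of (q - 1) e^{-K/T} + e^{J/T}, evaluated stably at low T
    a = J / T
    b = math.log(q - 1) - K / T
    m = max(a, b)
    log_w = m + math.log(math.exp(a - m) + math.exp(b - m)) - math.log(q)
    return -freed_sites * T * log_w
```

Evaluating the function at very high and very low temperatures recovers the attractive limit of (4) and the repulsive value $-2J$ respectively, with a sign change in between.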

Therefore this model suggests that it is improper to 
define interactions of hydrophobic origin inside proteins, 
and that the detailed structure of the cores of proteins 
should be stabilized by other mechanisms. Indeed, in the 
biochemistry literature the debate is still strong whether 
the hydrophobic interaction alone is able to enforce the 
full native state of proteins or other interactions should 
also be taken into account [12]. Effective interactions 
can be safely defined whenever they substitute some non- 
changing environment. When a protein is folding, its 
amino-acids find instead an ever changing environment 
that depends on water and on the other amino-acids. 
Even the reliability of two-body effective interactions vs. 
many-body ones is an open issue still to be settled. It is 
therefore intrinsically difficult to define effective poten- 
tials of some general validity between amino-acids: our 
model points out such a problem for hydrophobic inter- 
actions. 

In conclusion, we have introduced a model of proteins 
in water that is able to reproduce the known features 
of proteins (namely, cold destabilisation and warm de- 
naturation, a native state with a mostly hydrophobic 
core, and the correct free energy, enthalpy and entropy 
of folding). We also checked our results for different pro- 
tein lengths, sequences, parameter values and even imple- 
menting the full MLG model for the description of water. 
Although some details may change, the overall behavior 
is consistent and robust. Moreover, lattice models are 
intended to be only qualitatively instructive, whereas a 
quantitative description can be given only by more de- 
tailed off-lattice models. 

FIG. 1. Bimodal effective models. Panel (a): MLG model, 
with bimodal energy distributions both for bulk and shell water 
molecules. The lower levels represent ordered groups of 
water molecules, the higher levels disordered ones. The order 
of energies and of degeneracies, as obtained from experiments, 
is $E_{ds} > E_{db} > E_{ob} > E_{os}$ and $q_{ds} > q_{db} > q_{ob} > q_{os}$ (ds = 
disordered shell, os = ordered shell, db = disordered bulk, ob 
= ordered bulk). Panel (b): the simplified bimodal energy 
distribution, with just two free parameters, $K$ and $q$, since 
we can take $J$ as energy scale. 




FIG. 2. Specific heat, monomer-monomer contacts and 
number of water sites in an excited state for the protein shown 
in the inset; $J = 1$, $K = 2$ and q = 10^. 




FIG. 3. Free energy, enthalpy and entropy (times T) dif- 
ferences between denatured conformations and the native one 
(shown in the inset of Fig. 2), for the same parameter values 
as in Fig. 2. Since $T\Delta S$ grows linearly at high temperatures, 
$\Delta S$ saturates. 

(a) 



(b) 



FIG. 4. Two different conformations of the same sequence 
differing only for a reorganization of the core amino-acids. 






